Extending Word-Level Quality Estimation for Post-Editing Assistance
We define a novel concept called extended word alignment in order to improve
post-editing assistance efficiency. Based on extended word alignment, we
further propose a novel task called refined word-level QE that outputs refined
tags and word-level correspondences. Compared to original word-level QE, the
new task can directly point out editing operations, thus improving
efficiency. To extract extended word alignment, we adopt a supervised method
based on mBERT. To solve refined word-level QE, we first predict original QE
tags by training a regression model for sequence tagging based on mBERT and
XLM-R. Then, we refine the original word tags with extended word alignment. In
addition, we extract source-gap correspondences and obtain gap tags at the same time.
Experiments on two language pairs show the feasibility of our method and
suggest directions for further improvement.
Identifying and Utilizing the Class of Monosemous Japanese Functional Expressions in Machine Translation
PACLIC 23 / City University of Hong Kong / 3-5 December 200
Towards Conceptual Indexing of the Blogosphere through Wikipedia Topic Hierarchy
PACLIC 23 / City University of Hong Kong / 3-5 December 200
Analysing features of Japanese splogs and characteristics of keywords
This paper focuses on analyzing (Japanese) splogs based on various characteristics of the keywords they contain. By analyzing these keyword characteristics, we estimate how spammers behave when creating splogs from other sources. Since splogs often introduce noise into word occurrence statistics in the blogosphere, we assume that we can efficiently (manually) collect splogs by sampling blog homepages that contain keywords of a certain type on the date when the keyword occurs most frequently. We manually examine various features of the collected blog homepages, including whether their text content is excerpted from other sources and whether they display affiliate advertisements or outgoing links to affiliated sites. Among various informative results, it is important to note that more than half of the collected splogs are created by a very small number of spammers.
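The sampling step described in this abstract (pick the date on which a keyword occurs most often, then collect the blog homepages that contain it on that date) might be sketched as follows. This is only an illustration under stated assumptions: the post record layout (`date`, `homepage`, `text`), the function names, and the plain substring match are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from datetime import date


def peak_date(posts, keyword):
    """Return the date on which the most posts contain `keyword` (None if it never occurs)."""
    counts = Counter(p["date"] for p in posts if keyword in p["text"])
    if not counts:
        return None
    # Highest count wins; ties are broken by the earliest date (an assumption).
    return min(counts, key=lambda d: (-counts[d], d))


def sample_homepages(posts, keyword):
    """Collect the homepages whose posts contain `keyword` on its peak date."""
    day = peak_date(posts, keyword)
    if day is None:
        return []
    return sorted({p["homepage"] for p in posts
                   if p["date"] == day and keyword in p["text"]})


# Hypothetical toy data; real input would come from a blog crawl.
posts = [
    {"date": date(2009, 1, 1), "homepage": "a.example", "text": "cheap loan offer"},
    {"date": date(2009, 1, 2), "homepage": "a.example", "text": "loan loan loan"},
    {"date": date(2009, 1, 2), "homepage": "b.example", "text": "best loan rates"},
    {"date": date(2009, 1, 2), "homepage": "c.example", "text": "my cat photos"},
]
candidates = sample_homepages(posts, "loan")
```

The homepages returned this way are only candidates; in the abstract, deciding whether each one is actually a splog is done manually.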
Open-source Software for Developing Anthropomorphic Spoken Dialog Agents
This paper discusses an architecture for a highly interactive, human-like spoken dialog agent. To make it easy to integrate modules
with different characteristics, including a speech recognizer, a speech synthesizer, a facial-image synthesizer and a dialog controller, each module
is modeled as a virtual machine that has a simple common interface and is connected to the other modules through a broker (communication
manager). The agent system under development is supported by the IPA and will be made publicly available as a software toolkit this year.
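The broker-based design described in this abstract, where every module exposes the same small interface and exchanges messages only through a communication manager, could look roughly like the sketch below. It is a minimal illustration, not the toolkit's actual API: the class names, the `send`/`receive` interface, the module names, and the synchronous dispatch are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Dict, List, Optional


class Module:
    """Common interface shared by every module (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.broker: Optional[Broker] = None

    def send(self, recipient: str, message: str) -> None:
        # Modules never call each other directly; everything goes via the broker.
        if self.broker is None:
            raise RuntimeError(f"{self.name} is not registered with a broker")
        self.broker.route(self.name, recipient, message)

    def receive(self, sender: str, message: str) -> None:
        raise NotImplementedError


class Broker:
    """Communication manager that forwards messages between registered modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, Module] = {}

    def register(self, module: Module) -> None:
        self._modules[module.name] = module
        module.broker = self

    def route(self, sender: str, recipient: str, message: str) -> None:
        self._modules[recipient].receive(sender, message)


class SpeechRecognizer(Module):
    def recognize(self, text: str) -> None:
        # Stand-in for real recognition: pass the "recognized" text on.
        self.send("controller", text)

    def receive(self, sender: str, message: str) -> None:
        pass  # this toy recognizer ignores incoming messages


class DialogController(Module):
    def receive(self, sender: str, message: str) -> None:
        self.send("synthesizer", f"You said: {message}")


class SpeechSynthesizer(Module):
    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.spoken: List[str] = []

    def receive(self, sender: str, message: str) -> None:
        self.spoken.append(message)  # stand-in for audio output


broker = Broker()
recognizer = SpeechRecognizer("recognizer")
controller = DialogController("controller")
synthesizer = SpeechSynthesizer("synthesizer")
for m in (recognizer, controller, synthesizer):
    broker.register(m)
recognizer.recognize("hello")
```

Because modules address each other only by name through the broker, a further module such as a facial-image synthesizer could be added by registering it, without changing the existing modules.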
- …